SE Minneapolis, MN 55455-0159 USA TR 08-022 Bayesian Co-clustering
Abstract
In recent years, co-clustering has emerged as a powerful data mining tool for analyzing dyadic data that connect two entities. However, almost all existing co-clustering techniques are partitional and allow each row and column of a data matrix to belong to only one cluster. Several current applications, such as recommendation systems and market basket analysis, can benefit substantially from a mixed membership of rows and columns. In this paper, we present Bayesian co-clustering (BCC) models that allow mixed membership in row and column clusters. BCC maintains separate Dirichlet priors for rows and columns over the mixed membership and assumes that each observation is generated by an exponential family distribution corresponding to its row and column clusters. We propose a fast variational algorithm for inference and parameter estimation. The model naturally handles sparse matrices because inference uses only the non-missing entries. In addition to finding co-cluster structure in the observations, the model outputs a low-dimensional co-embedding and accurately predicts missing values in the original matrix. We demonstrate the efficacy of the model through experiments on both simulated and real data.
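The generative story described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it picks a unit-variance Gaussian as the exponential family member, and the function name `sample_bcc`, the hyperparameter values, and the observed-entry fraction are all hypothetical choices made for the example.

```python
import numpy as np

def sample_bcc(n_rows, n_cols, k_row, k_col,
               alpha_row=0.5, alpha_col=0.5, observed_frac=0.3, seed=0):
    """Sample a sparse matrix from a BCC-style generative process (sketch)."""
    rng = np.random.default_rng(seed)

    # Separate Dirichlet priors: one mixed-membership vector per row and per column.
    pi_row = rng.dirichlet(np.full(k_row, alpha_row), size=n_rows)
    pi_col = rng.dirichlet(np.full(k_col, alpha_col), size=n_cols)

    # One parameter per co-cluster (assumption: Gaussian mean, unit variance).
    means = rng.normal(0.0, 5.0, size=(k_row, k_col))

    # Only a subset of entries is observed; the rest stay missing (NaN).
    mask = rng.random((n_rows, n_cols)) < observed_frac
    X = np.full((n_rows, n_cols), np.nan)

    for u, v in zip(*np.nonzero(mask)):
        z_row = rng.choice(k_row, p=pi_row[u])  # row cluster for this entry
        z_col = rng.choice(k_col, p=pi_col[v])  # column cluster for this entry
        X[u, v] = rng.normal(means[z_row, z_col], 1.0)

    return X, mask, pi_row, pi_col, means

X, mask, pi_row, pi_col, means = sample_bcc(20, 15, k_row=3, k_col=2)
print(X.shape, int(mask.sum()), "observed entries")
```

Only entries where `mask` is true are generated, which mirrors the abstract's point that the model works from the non-missing entries alone. The rows of `pi_row` and `pi_col` are the mixed memberships that a partitional method would replace with a single hard assignment.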
Similar resources
SE Minneapolis, MN 55455-0159 USA TR 08-042 Infobionics Server - the next generation database
This paper describes the ‘Infobionics Server’, a next-generation database also referred to as the ‘Cellular Database Server’, which is based on a novel ‘cellular’ data model.
Department of Computer Science and Engineering, University of Minnesota, 4-192 EECS Building, 200 Union Street SE, Minneapolis, MN 55455-0159 USA TR 04-002 Enhancing location service scalability with HIGH-GRADE
Smaller is tougher
A.R. Beaber (a), J.D. Nowak (b), O. Ugurlu (c), W.M. Mook (d), S.L. Girshick (e), R. Ballarini (f) & W.W. Gerberich (a). (a) Department of Chemical Engineering and Materials Science, University of Minnesota, 421 Washington Ave SE, Minneapolis, MN 55455, USA; (b) Hysitron Incorporated, 10025 Valley View Road, Minneapolis, Minnesota 55344, USA; (c) Characterization Facility, University of Minnes...
Small size strength dependence on dislocation nucleation
J.D. Nowak, A.R. Beaber, O. Ugurlu, S.L. Girshick and W.W. Gerberich* Hysitron Incorporated, 10025 Valley View Road, Minneapolis, MN 55344, USA Department of Chemical Engineering and Materials Science, University of Minnesota, 421 Washington Ave SE, Minneapolis, MN 55455, USA Characterization Facility, University of Minnesota, Minneapolis, MN 55455, USA Department of Mechanical Engineering, Uni...
Department of Computer Science and Engineering, University of Minnesota, 4-192 EECS Building, 200 Union Street SE, Minneapolis, MN 55455-0159 USA TR 04-021 gCLUTO – An Interactive Clustering, Visualization, and Analysis System
Recently published studies have shown that partitional clustering algorithms that optimize certain criterion functions, which measure key aspects of inter- and intra-cluster similarity, are very effective in producing hard clustering solutions for document datasets and outperform traditional partitional and agglomerative algorithms. In this paper we study the extent to which these criterion funct...
Publication date: 2008